



## MANIPAL INSTITUTE OF TECHNOLOGY





SIXTH SEMESTER B.TECH. (CSE) DEGREE END SEMESTER EXAMINATION MAY 2014
PARALLEL COMPUTER ARCHITECTURE AND PROGRAMMING(CSE 306)

DATE:9-5-2014
TIME: 3 HOURS MAX.MARKS: 50

## **Instructions to Candidates**

- Answer any five full questions.
- 1A. Consider 4-pipeline stages, IF, ID, OF and EX are arranged into a linear cascade . How do you differentiate between overlapped instruction execution and sequentially non-overlapped instruction execution with respect to their space-time diagram?
- 1B. Explain following with respect to performance of parallel processing
  - a. CPU Throughput
- b. CPI
- c. Efficiency
- 1C. With neat diagram explain the functional structure of pipeline computer handling scalar and vector data and MIMD multiprocessor. What is the role of SIMD array processor?

(3+3+4)

2A. Consider the following reservation table.

|            | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|------------|---|---|---|---|---|---|---|
| <b>S</b> 1 | X |   |   |   |   | X |   |
| S2         |   |   | X |   |   |   | X |
| S3         |   | X |   | X |   |   |   |
| S4         |   |   | X |   | X |   |   |

- i. Determine latencies in the forbidden list F and the collision vector C.
- ii. Draw the state diagram and list all the simple cycles.
- iii. Identify greedy cycles among the simple cycles.
- iv. What is the MAL of this pipeline?
- v. What is the efficiency of this pipeline?

- 2B. Explain Feng's classification with an example.
- 2C. Explain different types of barrier operations for a command queue.

(5+3+2)

3A. Consider 2 matrices of size 3x3. Simulate parallel matrix multiplication using SIMD operation.

Show all memory allocation for SIMD matrix multiplication.

3B. Consider 5 Processing elements (PEs) with following values. Explain SIMD addition .Give masking scheme for the same. Explain masking scheme used in step 2 of SIMD operation.

| PE0 | PE1 | PE2 | PE3 | PE4 |
|-----|-----|-----|-----|-----|
| 312 | 414 | 641 | 17  | 2   |

3C.

| <u>Stream</u> |
|---------------|
| add a,b,c     |
| add d,b,e     |
| mul f, a, e   |
| add g,d,a     |
| mul h ,g , k  |
| fadd a, b, c  |
| fmul d, k, e  |
| fmul f,d,s    |
| add g,b ,d    |
| mul f, g, m   |
| fadd h, g , d |
|               |
|               |

Show the instruction dependency and scheduling scheme which are done using superscalar processor when 3 integer ALUs and 2 Floating point ALUs are available.

(4 + 2 + 4)

4A. Write MPI program to read a matrix of size 3x3 and enter an element to be searched in the input matrix through the root process, and find its occurrence in the input matrix through the root process. Root process distribute equally the search work among all processes with ranks 0,1 and 2 respectively(3 elements each). Once all three processes finish the counting of the occurrence, the root process gathers the partial count computations from all the processes including itself. The root process displays the number of occurrences. Meanwhile, the no. of occurrences replace the principal diagonal of the input matrix by the root process.

Example:

Enter the matrix :

1 2 7

4 5 6

779

Enter the element to be searched: 7

No. of occurrence of 7: 3

Resultant matrix:

**3** 2 7

4 3 6

7 7 3

4B. Explain all collective communications with respect to message passing interface.

4C. Explain different specification models used in OpenCL.

(5+3+2)

- 5A. Write a complete OpenCL program to subtract 2 vectors each of size 2048 of integers.
- 5B. Write the kernel code for above OpenCL program.

(8 + 2)

- 6A. Explain the following
  - i. Uniform memory access architecture
  - ii. Distributed memory multiprocessor
- 6B. What is cache coherence problem? Explain 2 basic options followed while writing to cache. ((2+3)+5)